Generic Discrimination: Partitioning and Sorting of Complex Data in Linear Time

Author

  • Fritz Henglein
Abstract

We introduce the notion of discrimination, a generalized form of partitioning, and present an expressive term language for defining equivalence relations on complex data. The language allows equivalence relations to be defined by freely combining structural equivalence, equivalence of lists under commutativity and idempotence (bag and set equivalence), and more. We then show that worst-case linear-time discriminators can be defined generically, by induction on the term language, using multiset discrimination. When the discriminators for base types such as characters and integer segments sort their inputs, the inductive construction yields discriminators that both partition and sort their input in linear time for a wide range of total preorders. This amounts to generically bootstrapping pigeonhole sorting for a finite segment of primitive data to linear-time sorting of complex data. We show how these discriminators, both sorting and nonsorting, can be coded compactly and elegantly using Generalized Algebraic Data Types (GADTs) and list comprehensions, and we give some examples of applications of discriminators. Finally, we argue that discrimination should replace equality testing as a language primitive for built-in types and for abstract types that wish to make only equality observable: discriminators algorithmically generalize equality testing, which is essentially discrimination of 2 elements. Discriminators allow partitioning and even sorting of lists of arbitrary size in linear time without comparison operations (as in comparison-based sorting) or arithmetization of the values (as in hash-based methods). Thus even references in ML could be discriminated in linear time instead of quadratic time if discrimination, rather than equality testing, were the built-in operation for exposing reference equality.

1 Partitioning and discrimination

Definition 1 (Equivalence, Partitioner, Discriminator). An equivalence (T, Eq) is a type T together with a binary relation Eq ⊆ T × T that is reflexive, symmetric and transitive. A partitioner for equivalence (T, Eq) is a function PartEq : [T] → [[T]] that takes a list of elements and partitions them according to the equivalence Eq. We call each element of the output a block (which is itself a list). More precisely, we have:
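
To make Definition 1 and the inductive construction described in the abstract more concrete, here is a minimal Haskell sketch. It is not the paper's code, and it is not the formal statement that follows Definition 1 in the full text. The names `Equiv`, `NatE`, `UnitE`, `SumE`, `ProdE`, `MapE`, `listE`, `disc` and `part` are hypothetical and chosen for illustration. The sketch assumes that keys passed to `NatE n` lie in the range 0 to n, and it makes no attempt to reach the worst-case linear-time bound claimed in the paper, because every `NatE` call allocates a fresh bucket table.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Array (accumArray, elems)

-- Term language for equivalence relations (hypothetical constructor names).
data Equiv a where
  NatE  :: Int -> Equiv Int                         -- equality on 0..n
  UnitE :: Equiv ()                                 -- all values equivalent
  SumE  :: Equiv a -> Equiv b -> Equiv (Either a b) -- tagged union
  ProdE :: Equiv a -> Equiv b -> Equiv (a, b)       -- componentwise
  MapE  :: (a -> b) -> Equiv b -> Equiv a           -- equivalence on images

-- Structural equivalence of lists, obtained by unrolling a list into
-- "empty, or a head paired with a tail".
listE :: Equiv a -> Equiv [a]
listE e = MapE unroll (SumE UnitE (ProdE e (listE e)))
  where
    unroll []       = Left ()
    unroll (x : xs) = Right (x, xs)

-- A discriminator groups the values of key/value pairs into blocks of
-- values whose keys are equivalent. It is defined by induction on Equiv.
disc :: Equiv k -> [(k, v)] -> [[v]]
disc _ []       = []
disc _ [(_, v)] = [[v]]
disc (NatE n) xs =
  -- Bucket by key. Reversing first keeps input order inside each bucket.
  -- A key outside 0..n raises an error (assumption of this sketch).
  filter (not . null) (elems (accumArray (flip (:)) [] (0, n) (reverse xs)))
disc UnitE xs = [map snd xs]
disc (SumE e1 e2) xs =
  disc e1 [(k, v) | (Left k, v) <- xs] ++ disc e2 [(k, v) | (Right k, v) <- xs]
disc (ProdE e1 e2) xs =
  [ vs
  | ys <- disc e1 [(k1, (k2, v)) | ((k1, k2), v) <- xs]
  , vs <- disc e2 ys
  ]
disc (MapE f e) xs = disc e [(f k, v) | (k, v) <- xs]

-- A partitioner in the sense of Definition 1: each element is its own key.
part :: Equiv t -> [t] -> [[t]]
part e xs = disc e [(x, x) | x <- xs]
```

For example, under these assumptions `part (ProdE (NatE 9) (NatE 9)) [(1,2),(3,4),(1,2),(3,5)]` evaluates to `[[(1,2),(1,2)],[(3,4)],[(3,5)]]`. Because the `NatE` case emits its buckets in ascending key order, the blocks here also come out sorted, which mirrors the abstract's remark that sorting base discriminators yield sorting discriminators for compound types.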

Similar resources

Generic top-down discrimination for sorting and partitioning in linear time

We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that discriminators (discrimination functions) can be defined generically, by structural recursion on representations of ordering and equivalence relations. Discriminators improve the asymptotic performance of generic comparison-based sorting and partitioning, and can be implemented not to ex...

Generic Top-down Discrimination

We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that discriminators (discrimination functions) can be defined generically, by structural recursion on order and equivalence expressions denoting a rich class of total preorders and equivalence relations, respectively. Discriminators improve the asymptotic performance of generic comparison-bas...

Sorting and Searching by Distribution: From Generic Discrimination to Generic Tries

A discriminator partitions values associated with keys into groups listed in ascending order. Discriminators can be defined generically by structural recursion on representations of ordering relations. Employing type-indexed families, we demonstrate how tries with an optimal-time lookup function can be constructed generically in worst-case linear time. We provide generic implementations of compar...

Discrimination of time series based on kernel method

Classical discrimination methods, such as linear and quadratic ones, are not efficient for non-Gaussian or nonlinear time series data. Nonparametric kernel discrimination, in which kernel estimators of the likelihood functions are used instead of their real values, has been shown to perform well. The misclassification rate of kernel discrimination is usually less than ...

Optimizing relational algebra operations using generic partitioning discriminators and lazy products

We show how to implement in-memory execution of the core relational algebra operations of projection, selection and cross-product efficiently, using discrimination-based joins and lazy products. We introduce the notion of (partitioning) discriminator, which partitions a list of values according to a specified equivalence relation on keys the values are associated with. We show how discriminator...

Publication date: 2007